open-source framework
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
Carta, Salvatore Mario, Chessa, Stefano, Contu, Giulia, Corriga, Andrea, Deidda, Andrea, Fenu, Gianni, Frigau, Luca, Giuliani, Alessandro, Grassi, Luca, Manca, Marco Manolo, Marras, Mirko, Mola, Francesco, Mossa, Bastianino, Mura, Piergiorgio, Ortu, Marco, Piano, Leonardo, Pisano, Simone, Pisu, Alessia, Podda, Alessandro Sebastian, Pompianu, Livio, Seu, Simone, Tiddia, Sandro Gabriele
Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.
- Europe > Italy > Sardinia > Cagliari (0.04)
- North America > United States > New Mexico (0.04)
- North America > United States > Michigan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Jaghouar, Sami, Ong, Jack Min, Hagemann, Johannes
OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments, offering it within a scalable, decentralized training framework using the Hivemind library. We demonstrate its effectiveness by training a model across two continents and three countries, while maintaining 90-95% compute utilization. Additionally, we conduct ablation studies focusing on the algorithm's compute efficiency and its scalability in the number of workers, and show that its gradients can be all-reduced using FP16 without any performance degradation. Furthermore, we scale OpenDiLoCo to 3x the size of the original work, demonstrating its effectiveness for billion-parameter models.
- Information Technology > Software (0.60)
- Information Technology > Artificial Intelligence > Natural Language (0.53)
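As a rough illustration of the FP16 all-reduce mentioned in the OpenDiLoCo abstract above, the following minimal sketch averages per-worker parameter deltas (often called pseudo-gradients) after rounding them to half precision, then applies them to the shared parameters. It is not the OpenDiLoCo or Hivemind code: every function name here is hypothetical, the all-reduce is simulated in a single process, and the plain SGD outer step is a simplifying assumption, since the abstract does not specify the outer optimizer.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def pseudo_gradient(global_params, local_params):
    """Delta between the shared parameters and one worker's locally trained copy."""
    return [g - l for g, l in zip(global_params, local_params)]


def all_reduce_mean_fp16(per_worker_grads):
    """Simulated all-reduce: cast each worker's values to FP16, then average."""
    n = len(per_worker_grads)
    compressed = [[to_fp16(v) for v in grads] for grads in per_worker_grads]
    return [sum(column) / n for column in zip(*compressed)]


def outer_step(global_params, averaged_grad, outer_lr=1.0):
    """Plain SGD outer update (an assumption made for brevity)."""
    return [p - outer_lr * g for p, g in zip(global_params, averaged_grad)]


# Toy example: two workers that drifted from the shared parameters
# after their local training steps.
global_params = [1.0, -2.0, 0.5]
worker_params = [[0.9, -1.8, 0.45], [0.7, -2.2, 0.55]]

grads = [pseudo_gradient(global_params, w) for w in worker_params]
averaged = all_reduce_mean_fp16(grads)
new_params = outer_step(global_params, averaged, outer_lr=1.0)
print(new_params)
```

With an outer learning rate of 1.0 and plain averaging, the updated parameters land close to the mean of the workers' parameters; the small residual is the rounding error introduced by the FP16 cast.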
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models
Sheikh, Zaid, Anastasopoulos, Antonios, Rijhwani, Shruti, Tjuatja, Lindia, Jimerson, Robbie, Neubig, Graham
Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models. These requirements can be a significant obstacle for language community members and linguists who want to use NLP tools. This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models. CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages, even with limited training data. We describe various tools and APIs that are currently available and how developers can easily add new models/functionality to the framework. Code is available at https://github.com/neulab/cmulab along with a live demo at https://cmulab.dev
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > Erie County > Buffalo (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning
Ma, Jiaqi, Lai, Vivian, Zhang, Yiming, Chen, Chacha, Hamilton, Paul, Ljubenkov, Davor, Lakkaraju, Himabindu, Tan, Chenhao
Recently, there has been a surge of explainable AI (XAI) methods driven by the need for understanding machine learning model behaviors in high-stakes scenarios. However, properly evaluating the effectiveness of the XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies is complex; numerous design choices in the design space of user studies lead to problems of reproducibility; and running user studies can be challenging and even daunting for machine learning researchers. To address these challenges, this paper presents OpenHEXAI, an open-source framework for human-centered evaluation of XAI methods. OpenHEXAI features (1) a collection of diverse benchmark datasets, pre-trained models, and post hoc explanation methods; (2) an easy-to-use web application for user studies; (3) comprehensive evaluation metrics for the effectiveness of post hoc explanation methods in the context of human-AI decision making tasks; (4) best practice recommendations for experiment documentation; and (5) convenient tools for power analysis and cost estimation. OpenHEXAI is the first large-scale infrastructural effort to facilitate human-centered benchmarks of XAI methods. It simplifies the design and implementation of user studies for XAI methods, thus allowing researchers and practitioners to focus on the scientific questions. Additionally, it enhances reproducibility through standardized designs. Based on OpenHEXAI, we further conduct a systematic benchmark of four state-of-the-art post hoc explanation methods and compare their impacts on human-AI decision making tasks in terms of accuracy, fairness, as well as users' trust and understanding of the machine learning model.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > North Carolina (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.68)
Agents: An Open-source Framework for Autonomous Language Agents
Zhou, Wangchunshu, Jiang, Yuchen Eleanor, Li, Long, Wu, Jialong, Wang, Tiannan, Qiu, Shi, Zhang, Jintian, Chen, Jing, Wu, Ruipu, Wang, Shuai, Zhu, Shiding, Chen, Jiyu, Zhang, Wentao, Tang, Xiangru, Zhang, Ningyu, Chen, Huajun, Cui, Peng, Sachan, Mrinmaya
Recent advances in large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly as it enables non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly as its modularized design makes it easily extensible for researchers. Agents is available at https://github.com/aiwaves-cn/agents.
aoip.ai: An Open-Source P2P SDK
Konan, Joseph, Agnihotri, Shikhar, Hsieh, Chia-Chun
This white paper introduces aoip.ai, a groundbreaking open-source SDK incorporating peer-to-peer technology and advanced AI integration to transform VoIP and IoT applications. It addresses key market challenges by enhancing data security, elevating communication quality, and providing greater flexibility for developers and users. Developed in collaboration with Carnegie Mellon University, aoip.ai sets a new standard for decentralized and democratized communication solutions.
Seldon gears up with $20M to help businesses accelerate adoption of machine learning -- TFN
Seldon, a London-based data-centric machine learning operations (MLOps) platform, has secured a $20M Series B funding round led by new Portuguese investor Bright Pixel (formerly Sonae IM) with participation from existing investors AlbionVC (backed Ophelos), Cambridge Innovation Capital, and Amadeus Capital Partners. The funding will help Seldon expand its machine learning product's market fit and unlock enterprise-ready solutions based on open source. "AI is in everything, and Seldon is uniquely positioned to ensure a return on ML investment by providing robust, scalable, and secure infrastructure, pioneering a data-centric approach to ML pipelines, prioritizing team collaboration across the organization, and making sure teams can solve meaningful problems at scale by building trust in machine learning, even under the most intense regulatory conditions. We're excited to bring together new investor Bright Pixel Capital and our existing partners, who believe in our vision and can help us become the trusted MLOps partner of any organization worldwide." Currently, numerous companies are investing heavily in artificial intelligence, but they are having difficulty scaling their models for practical use. Bottlenecks in team workflows, increased regulation and compliance constraints, a lack of trust in model outputs, and the need to ensure peak model performance are all top of mind for AI-powered enterprises. This is where Seldon helps Data Scientists, ML Engineers, and other stakeholders in the company adopt machine learning quickly and efficiently to address these challenges. Founded in 2014, Seldon is a data science and machine learning operations platform that aims to empower Data Scientists, ML Engineers, and MLOps teams to deploy, monitor, explain, and manage their ML models. With Seldon, organisations can minimise risk and drastically cut down time-to-value from their models.
The UK company offers both an open-source framework, "Core," which focuses on model deployment, and an enterprise product, "Deploy Advanced," which builds on this functionality to power model monitoring, explainability, and management. Seldon claims that it has achieved a 400% YoY growth rate in its open-source frameworks installed and running since its Series A in November 2020. "Seldon has differentiated itself by presenting a unique solution that can reduce the friction for users deploying and explaining ML models across any industry.
MONAI: An open-source framework for deep learning in healthcare
Cardoso, M. Jorge, Li, Wenqi, Brown, Richard, Ma, Nic, Kerfoot, Eric, Wang, Yiheng, Murrey, Benjamin, Myronenko, Andriy, Zhao, Can, Yang, Dong, Nath, Vishwesh, He, Yufan, Xu, Ziyue, Hatamizadeh, Ali, Myronenko, Andriy, Zhu, Wentao, Liu, Yun, Zheng, Mingxin, Tang, Yucheng, Yang, Isaac, Zephyr, Michael, Hashemian, Behrooz, Alle, Sachidanand, Darestani, Mohammad Zalbagi, Budd, Charlie, Modat, Marc, Vercauteren, Tom, Wang, Guotai, Li, Yiwen, Hu, Yipeng, Fu, Yunguan, Gorman, Benjamin, Johnson, Hans, Genereaux, Brad, Erdal, Barbaros S., Gupta, Vikash, Diaz-Pinto, Andres, Dourson, Andre, Maier-Hein, Lena, Jaeger, Paul F., Baumgartner, Michael, Kalpathy-Cramer, Jayashree, Flores, Mona, Kirby, Justin, Cooper, Lee A. D., Roth, Holger R., Xu, Daguang, Bericat, David, Floca, Ralf, Zhou, S. Kevin, Shuaib, Haris, Farahani, Keyvan, Maier-Hein, Klaus H., Aylward, Stephen, Dogra, Prerna, Ourselin, Sebastien, Feng, Andrew
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.93)
- Government > Regional Government > North America Government > United States Government (0.68)
Machine Learning with the Modern Data Stack: A Case Study
A lot has already been said about the modern data stack (MDS), but the situation is significantly more scattered on the machine learning side of the fence: once data is properly transformed, how is it consumed downstream to produce business value? This post is intended for anybody wanting to bridge the gap between working with data and actually delivering business value using machine learning. The MDS has been consolidated as a series of best practices around data collection, storage and transformation. At the end of the day, ingesting and transforming data is not (for most companies) an end in itself: while tech giants figured out a while ago how to "get models in production", most companies still struggle to productionize a model in less than 3 months.
Top 10 Open-Source AI Technologies Powering ML Projects in 2022
Artificial intelligence (AI) technologies are quickly transforming almost every sphere of our lives. From how we communicate to the means we use for transportation, we seem to be getting increasingly addicted to them. Because of these rapid advancements, massive amounts of talent and resources are dedicated to accelerating the growth of the technologies. Here are the top 10 open-source AI technologies powering ML projects in 2022. TensorFlow is an open-source machine learning framework that is easy to use and deploy across a variety of platforms.